Evolutionary stability and resistance to cheating in an indirect reciprocity model based on reputation

Indirect reciprocity is one of the main mechanisms to explain the emergence and sustainment of
altruism in societies. The standard approach to indirect reciprocity is based on reputation models. These are
games in which players base their decisions on their opponent's reputation gained in past interactions 
with other players (moral assessment). The combination of actions and moral assessment leads to a 
large diversity of strategies, thus determining the stability of any of them against invasions by all the 
others is a difficult task. We use a variant of a previously introduced reputation-based model that lets
us systematically analyze all these invasions and determine which ones are successful. Accordingly we 
are able to identify the third-order strategies (those which, apart from the action, judge considering 
both the reputation of the donor and that of the recipient) that are evolutionarily stable. Our results 
reveal that if a strategy resists the invasion of any other one sharing its same moral assessment, it 
can resist the invasion of any other strategy. However, if actions are not always witnessed, cheaters 
(i.e., individuals with a probability of defecting regardless of the opponent's reputation) have a 
chance to defeat the stable strategies for some choices of the probabilities of cheating and of being 
witnessed. Remarkably, by analyzing this issue with adaptive dynamics we find that whether an
honest population resists the invasion of cheaters is determined by a Hamilton-like rule, with the
probability that the cheat is discovered playing the role of the relatedness parameter.

PACS numbers: 02.50.-r,87.10.-c,87.23.-n,89.75.Fb 



I. INTRODUCTION 

The human being is the social animal par excellence. An
individual can help another even if it is the first time they 
meet or if they know that they will never meet again. 
Several mechanisms have been proposed to explain co- 
operation between unrelated individuals. Among them 
reciprocity, either direct or indirect, stands as one of the 
most successful explanations of altruism pj. In direct 
reciprocity individuals pay back the help received in re- 
peated encounters with the same partner ("I help you if 
you help me") @]. In society, however, many interactions 
have low chances to be repeated with the same individual. 
To explain altruism in those interactions, the concept of 
indirect reciprocity was introduced [J Q • Through this 
mechanism, individuals do not receive the consequences 
of their actions directly from the individuals they inter- 
act with, but indirectly through society ("I help others 
to be helped by others"). Indirect reciprocity is an important
mechanism for the emergence and sustainment of altruism, not only in
small-scale human societies but in other species as well. It also
certainly plays an important role in communication networks [11], [12].

There are two types of indirect reciprocity: upstream
and downstream. In upstream reciprocity [13, 14] an individual
opts for a given action taking into account whether
she was previously helped or not. In this respect up- 
stream reciprocity is more akin to a learning mechanism, 
because individuals adapt their choices based on their 
past experience. In downstream reciprocity — also called 
reputation-based indirect reciprocity — an individual as- 
signs a reputation to the others taking into account how 
they interact with the rest of the society [6], [15]-[18]. These
reputations allow her to decide whether she should help
these individuals or not in potential future encounters 
with them. Accordingly, downstream indirect reciprocity 
is a cognitively very demanding task: it requires observation,
memory and communication. It is this reputation-based
indirect reciprocity that will be the focus of the
present work.

Two different kinds of models of reputation-based indi- 
rect reciprocity have been considered in the literature. In 
indirect observation models [16] each action is observed
and judged only by one individual, who spreads this in- 
formation across the population through verbal commu- 
nication and gossip. Therefore all individuals share the 
same opinion about each other. On the contrary, in di- 
rect observation models [13, E^, H(| everyone witnesses 
the action and makes her private judgment of it. Thus
individuals' different opinions about the other members
of the society can coexist in this kind of model.

Ohtsuki and Iwasa [16] and Brandt and Sigmund [17]
have proposed a classification of the different strategies
in games with indirect reciprocity through their assessment
and action modules. Strategies can be classified either
as second-order or as third-order strategies. In both
cases, the reputation is assigned taking into account the
observed action and the reputation of the individual who
received it. But third-order strategies also look at the
reputation of the individual who performs the action. The
dynamics of second-order assessments has been explored
in [21]. Ohtsuki and Iwasa [16] also studied systematically
the evolutionary stability of third-order strategies.
Their model is an indirect observation model and therefore
the whole society shares the same moral assessment.
Stability is studied by confronting strategies with different
action rules. They concluded that there are eight
strategies, the so-called leading eight, which are
evolutionarily stable strategies (ESS) under these assumptions.
The meaning and success of these strategies have also been
studied by Ohtsuki and Iwasa [22]. On the other hand,
Uchida and Sigmund [23] have chosen some of the leading
eight strategies that share the same action rules but have
different moral assessments and have confronted them in
a model with private opinions.

In this work, we extend the systematic study carried
out by Ohtsuki and Iwasa by confronting strategies with
different moral assessments. Unlike their work, we use a
direct observation model in which individuals no longer
share the same opinion about the rest of the population.
We introduce the concept of coherence as a measure of
the relation between the moral assessment and the action
rules of a strategy and study how it relates to the stability
and efficiency of the strategies. We identify which
strategies resist the invasion of all the other strategies,
i.e., which combinations of moral assessment and action
rules emerge under this evolutionary competition. Finally,
we explore what happens when an action may be witnessed
by nobody in the population. Individuals can then take
the risk of cheating, i.e., defecting regardless of the opponent's
reputation, at no cost to their own reputation.

The present paper is structured as follows. In section II
we introduce the model. In section III we describe its
mathematical implementation. We study homogeneous
populations and discuss their stability against invasions
by other strategies. We also analyze the effect of
introducing a probability of cheating, when actions have a
chance not to be witnessed. Finally, our results are shown
in section IV and discussed in section V.



II. MODEL 

Brandt and Sigmund [17] introduced a very stylized
model of indirect reciprocity based on reputation, and
Ohtsuki and Iwasa [16], [22] investigated the stability of
its strategies under the assumption that all individuals
share the same moral judgment.

The model we will be dealing with in this work is a 
slight modification of this basic model. It consists of an 
infinite, well-mixed population of interacting and judging
individuals. At every time step a pair of individuals is
randomly and equiprobably drawn from the population.
One of them plays the role of the donor and the other one
that of the recipient. The donor then decides whether to pay
a cost $c > 0$ to help (C) the recipient or not (D). If the
recipient is helped, she receives a benefit $b > c$. This action
is observed by every individual of the population (includ- 
ing themselves). Observers privately judge the donor for 
the action taken on the recipient according to their own 
moral assessment, and assign her a reputation — either 
good (G) or bad (B) — accordingly. Therefore every in- 
dividual in the population has a private opinion of every 
other individual, including herself. 

This process is repeated until the population reaches 
an equilibrium (we will define this equilibrium in more
precise terms in the next section). Then the average payoff
that every individual receives in this repeated game is
computed. Direct reciprocity is excluded from this game 
because the population is virtually infinite — hence the 
probability that two people meet again is negligible. 

We consider third-order indirect reciprocity, i.e., each
strategy is described by two modules: the action rules and
the moral assessments.

The action rules determine what the donor must do
(either help or refuse to help) given the reputation of
both players. Specifically, $a_{i\alpha\beta} = 1$ (C) if strategist $i$ with
reputation $\alpha$ helps an individual with reputation $\beta$ (both
according to $i$'s moral judgment) and $a_{i\alpha\beta} = 0$ (D) otherwise.

The moral assessment tells the individual whether the action
just witnessed should be judged as good or bad, hence
revising the donor's reputation. Specifically, $m_{i\alpha\beta}(a) = 1$ (G)
if strategist $i$ assigns a good reputation to a donor
previously judged $\alpha$ by $i$ who performs an action $a$ on a
recipient previously judged $\beta$ by $i$, and $m_{i\alpha\beta}(a) = 0$ (B) otherwise.

Thus each strategy is defined by 12 numbers: 4 for the 
action module and 8 for the moral module. This amounts 
to 4096 different possible strategies. 
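
The count of 4096 strategies can be checked by direct enumeration. The sketch below assumes a hypothetical bit encoding that is not taken from the paper: reputations and actions are stored as 1 (G or C) and 0 (B or D), and the names `ACTION_KEYS`, `MORAL_KEYS` and `all_strategies` are illustrative only.

```python
from itertools import product

REPUTATIONS = (1, 0)  # 1 = good (G), 0 = bad (B)
ACTIONS = (1, 0)      # 1 = help (C), 0 = refuse to help (D)

# Keys of the action module: (donor reputation, recipient reputation).
ACTION_KEYS = list(product(REPUTATIONS, REPUTATIONS))
# Keys of the moral module: (donor reputation, recipient reputation, observed action).
MORAL_KEYS = list(product(REPUTATIONS, REPUTATIONS, ACTIONS))


def all_strategies():
    """Yield every third-order strategy as a pair of dicts (action rules, moral rules)."""
    for a_bits in product((0, 1), repeat=len(ACTION_KEYS)):
        action = dict(zip(ACTION_KEYS, a_bits))
        for m_bits in product((0, 1), repeat=len(MORAL_KEYS)):
            yield action, dict(zip(MORAL_KEYS, m_bits))


n_strategies = sum(1 for _ in all_strategies())
print(n_strategies)  # equals 2**4 * 2**8
```

Storing each module as a dictionary keyed by reputations keeps the correspondence with the indices $\alpha$, $\beta$ and $a$ of the text explicit, at the price of some speed.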

We will assume that players sometimes make mistakes 
when trying to help another individual [H, [l9|, [24|-[26jj. 
Thus, with probability $\epsilon_A$ a donor always defects
because of a lack of resources (poverty), and with probability $1 - \epsilon_A$
she performs the action she planned to. Another source
of errors is misjudgment, i.e., an individual can make a
mistake in interpreting the action. Social pressure lies in
this category. This kind of error is especially
important if the information on the action performed is
spread by gossiping, because then a misjudgment by the
witness will lead to a misjudgment by the entire population.
Otherwise, it affects only a small fraction of the individuals.
Since keeping track of errors may lead to a proliferation
of judgments (even between individuals sharing
the same moral assessment) and render the model
computationally unfeasible, we will content ourselves with
implementing only errors in the action.

III. MATHEMATICAL IMPLEMENTATION OF THE MODEL

A. Homogeneous populations 

Let us start by assuming that there is only one strategy
$i$ present in the population. Let $x_i$ be the fraction
of individuals considered good by the whole population
(there is a unique moral assessment). Then the rate of
change of $x_i$ is given by

$$\frac{dx_i}{dt} = \sum_{\alpha\beta} \chi_\alpha(x_i)\,\chi_\beta(x_i)\,P_{i,\alpha\beta} - x_i, \tag{1}$$

where $P_{i,\alpha\beta}$ is the probability that a donor of reputation
$\alpha$ acting on a recipient with reputation $\beta$ is considered
good by the population. This probability can be obtained
as

$$P_{i,\alpha\beta} = (1 - \epsilon_A)\,m_{i\alpha\beta}(a_{i\alpha\beta}) + \epsilon_A\,m_{i\alpha\beta}(D), \tag{2}$$

because with probability $\epsilon_A$ no help is provided and with
probability $1 - \epsilon_A$ the action performed is $a_{i\alpha\beta}$, as
prescribed by the action module. We have also introduced
the auxiliary function $\chi_\gamma(x_i)$,

$$\chi_\gamma(x_i) = \gamma x_i + (1 - \gamma)(1 - x_i), \tag{3}$$

which in this case represents the fraction of individuals
with reputation $\gamma$.

The dynamics reaches an equilibrium when $x_i = \sum_{\alpha\beta} \chi_\alpha(x_i)\,\chi_\beta(x_i)\,P_{i,\alpha\beta}$.
Therefore the fraction of good
individuals in a homogeneous population in equilibrium
is the solution $0 \le x_{iH} \le 1$ of the quadratic equation
$F(x_i) = 0$, where

$$F(x_i) = x_i^2\,(P_{i,11} + P_{i,00} - P_{i,10} - P_{i,01}) + x_i\,(P_{i,10} + P_{i,01} - 2P_{i,00} - 1) + P_{i,00}. \tag{4}$$

As $F(0) = P_{i,00} \ge 0$ and $F(1) = P_{i,11} - 1 \le 0$, there is
always a solution in $[0, 1]$, but in some cases there may be
two (when $P_{i,00} = 0$ or $P_{i,11} = 1$ or both), one stable and
one unstable, and there is a degenerate case (when all
coefficients in $F(x_i)$ vanish) in which any $x_i$ is a solution.
In this latter case, adding a small error, $\epsilon_M$, in the moral
assessment determines uniquely a stable solution. When
the population is homogeneous this can be done at no
computational cost by simply replacing $P_{i,\alpha\beta}$ in Eq. (4)
by $(1 - 2\epsilon_M)P_{i,\alpha\beta} + \epsilon_M$. This yields the expression
$F(x_i) = \epsilon_M(1 - 2x_i)$, whose only root is $x_i = 1/2$, regardless of
$\epsilon_M$. Hence we take this solution, which holds even in the
limit $\epsilon_M \to 0$, as the solution of this degenerate case.
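
The equilibrium of Eqs. (1)-(4) can be computed directly from the roots of the quadratic. The following is a minimal sketch under stated assumptions, not the authors' code: strategies are encoded with a hypothetical helper `encode`, whose entry order (GG, GB, BG, BB for actions; C before D within each pair for moral rules) is an assumption of this example, and the stable root is selected by the sign of $F'(x)$.

```python
import math

PAIRS = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (donor, recipient) reputations, 1 = G, 0 = B


def encode(moral, action):
    """Build (a, m) from strings such as 'GBBGGBBG' and 'CDCD'.

    Assumed (hypothetical) entry order: GG, GB, BG, BB for the action rules and
    GG(C), GG(D), GB(C), GB(D), BG(C), BG(D), BB(C), BB(D) for the moral rules.
    """
    a = {pair: int(ch == "C") for pair, ch in zip(PAIRS, action)}
    m = {}
    for k, pair in enumerate(PAIRS):
        m[pair + (1,)] = int(moral[2 * k] == "G")      # judgment of helping (C)
        m[pair + (0,)] = int(moral[2 * k + 1] == "G")  # judgment of refusing (D)
    return a, m


def judged_good_prob(a, m, eps_a):
    """Eq. (2): probability that the donor ends up with a good reputation."""
    return {p: (1 - eps_a) * m[p + (a[p],)] + eps_a * m[p + (0,)] for p in PAIRS}


def equilibrium_good_fraction(a, m, eps_a, tol=1e-12):
    """Stable root of F(x) = 0 in [0, 1], Eq. (4); 1/2 in the degenerate case."""
    P = judged_good_prob(a, m, eps_a)
    A = P[(1, 1)] + P[(0, 0)] - P[(1, 0)] - P[(0, 1)]
    B = P[(1, 0)] + P[(0, 1)] - 2 * P[(0, 0)] - 1
    C = P[(0, 0)]
    if abs(A) < tol and abs(B) < tol:
        return 0.5  # all coefficients vanish: limit of a small moral error
    if abs(A) < tol:
        roots = [-C / B]
    else:
        disc = max(B * B - 4 * A * C, 0.0)
        roots = [(-B + s * math.sqrt(disc)) / (2 * A) for s in (1, -1)]
    inside = [min(max(r, 0.0), 1.0) for r in roots if -tol <= r <= 1 + tol]
    stable = [r for r in inside if 2 * A * r + B < 0]  # F'(x) < 0 at a stable root
    return (stable or inside)[0]


a, m = encode("GBBGGBBG", "CDCD")
x_h = equilibrium_good_fraction(a, m, eps_a=0.01)
print(round(x_h, 4))
```

For this example strategy the quadratic term vanishes, so the equilibrium reduces to the single root of a linear equation.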

Given the equilibrium fraction $x_{iH}$, the probability
that an individual helps another is

$$\theta_{iH} = (1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_{iH})\,\chi_\beta(x_{iH})\,a_{i\alpha\beta}. \tag{5}$$

Therefore the average payoff that any individual in this
population obtains is

$$W_{iH} = (b - c)\,\theta_{iH}. \tag{6}$$

As the whole population shares the same strategy, it can
be regarded as a measure of 'self-efficiency'. This provides
a means to classify strategies.

Coherence provides an alternative classification criterion.
Given an action $a$ that a donor with reputation $\alpha$
performs on a recipient with reputation $\beta$, we call an
individual coherent if, placed in the donor's shoes, she performs
the same action $a$ when she morally assesses it as good,
and the opposite action $1 - a$ when she morally assesses
it as bad. In other words, an individual is coherent if
she performs actions that she judges as good and does the
opposite of actions that she judges as bad. Thus we can
introduce a coherence index $h_i$ as

$$h_i = \frac{1}{2} \sum_{\alpha\beta} \sum_{a} \left[ 1 - \left| m_{i\alpha\beta}(a) - \delta(a, a_{i\alpha\beta}) \right| \right] \chi_\alpha(x_{iH})\,\chi_\beta(x_{iH}), \tag{7}$$

where $\delta(x, y) = 1$ if $x = y$ and 0 otherwise. This index
can range from 0 (no coherence) to 1 (full coherence).
Notice that the coherence of a strategy can change when
more strategies are present in the population, because
it depends on the fraction of good and bad individuals.
Nevertheless, for the sake of classification, we have
defined this index for a homogeneous population so that it
is uniquely determined by $x_{iH}$, and therefore is an
intrinsic feature of each strategy.
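
As an illustration of Eqs. (5)-(7), the sketch below evaluates the cooperation probability, the homogeneous payoff and the coherence index for one example strategy. The dictionary encoding (1 for G or C, 0 for B or D), the function names and the value of `x_h` supplied by hand are assumptions of this example; the prefactor 1/2 in `coherence` is the normalization that makes the index range from 0 to 1.

```python
def chi(gamma, x):
    """Eq. (3): fraction of individuals with reputation gamma (1 = G, 0 = B)."""
    return gamma * x + (1 - gamma) * (1 - x)


def cooperation_probability(a, x, eps_a):
    """Eq. (5): probability that a donor actually helps in a homogeneous population."""
    return (1 - eps_a) * sum(
        chi(al, x) * chi(be, x) * a[(al, be)] for al in (0, 1) for be in (0, 1)
    )


def homogeneous_payoff(a, x, eps_a, b, c):
    """Eq. (6): average payoff in a homogeneous population."""
    return (b - c) * cooperation_probability(a, x, eps_a)


def coherence(a, m, x):
    """Eq. (7): agreement between what is done and what is judged as good."""
    total = 0.0
    for al in (0, 1):
        for be in (0, 1):
            for act in (0, 1):
                agree = 1 - abs(m[(al, be, act)] - int(act == a[(al, be)]))
                total += agree * chi(al, x) * chi(be, x)
    return total / 2


# Example strategy: keys are (donor, recipient) and (donor, recipient, action).
a_ex = {(1, 1): 1, (1, 0): 0, (0, 1): 1, (0, 0): 0}
m_ex = {(1, 1, 1): 1, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
        (0, 1, 1): 1, (0, 1, 0): 0, (0, 0, 1): 0, (0, 0, 0): 1}
x_h = 1 / 1.01  # equilibrium fraction of good individuals for eps_a = 0.01

print(homogeneous_payoff(a_ex, x_h, eps_a=0.01, b=2.0, c=1.0))
print(coherence(a_ex, m_ex, x_h))
```

In this example every prescribed action is judged good and every opposite action is judged bad, so each term of the sum contributes fully.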



B. Stability of strategies 

Consider now a homogeneous population where indi- 
viduals share the same resident strategy. From time to 
time a small fraction of the population can adopt a new 
mutant strategy. This mutant strategy will eventually 
invade the resident population if mutants obtain a higher 
payoff than residents. 

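Calculating these payoffs requires computing the four
fractions of individuals that are considered good and bad
by the first and the second strategy in equilibrium. In
the limit where the fraction of mutants is very small both
residents and mutants interact only with residents. The
dynamics of these four fractions of individuals is given in
this limit by the equations

$$\frac{dx_1^{A_1A_2}}{dt} = \sum_{\substack{\alpha_1\alpha_2 \\ \beta_1\beta_2}} x_1^{\alpha_1\alpha_2}\,x_1^{\beta_1\beta_2}\,P^{A_1A_2}_{1,\alpha_1\alpha_2\beta_1\beta_2} - x_1^{A_1A_2}, \tag{8}$$

$$\frac{dx_2^{A_1A_2}}{dt} = \sum_{\substack{\alpha_1\alpha_2 \\ \beta_1\beta_2}} x_2^{\alpha_1\alpha_2}\,x_1^{\beta_1\beta_2}\,P^{A_1A_2}_{2,\alpha_1\alpha_2\beta_1\beta_2} - x_2^{A_1A_2}, \tag{9}$$

where $x_i^{A_1A_2}$ are the fractions of $i$-strategists ($i = 1$ for
residents and $i = 2$ for mutants) who are judged $A_1$ by
residents and $A_2$ by mutants; $P^{A_1A_2}_{i,\alpha_1\alpha_2\beta_1\beta_2}$ is the
probability that an $i$-strategist with reputation $\alpha_1$ for residents
and $\alpha_2$ for mutants, acting on a recipient of the resident
population with reputation $\beta_1$ for other residents and $\beta_2$
for mutants, is judged $A_1$ by residents and $A_2$ by mutants.
The form of this probability is

$$P^{A_1A_2}_{i,\alpha_1\alpha_2\beta_1\beta_2} = (1 - \epsilon_A)\,\delta\big(A_1, m_{1\alpha_1\beta_1}(a_{i\alpha_i\beta_i})\big)\,\delta\big(A_2, m_{2\alpha_2\beta_2}(a_{i\alpha_i\beta_i})\big) + \epsilon_A\,\delta\big(A_1, m_{1\alpha_1\beta_1}(D)\big)\,\delta\big(A_2, m_{2\alpha_2\beta_2}(D)\big). \tag{10}$$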


Equations (8) and (9) can be simplified in
equilibrium. Nonetheless some of the equations need to
be solved numerically (see Appendix A). To this purpose
we must start from a sensible initial condition.
We will assume that just before the invasion begins,
all individuals, both mutants and residents, share the
same opinion about everybody. The rationale for this
choice is that, before the change of strategy undergone
by mutants takes place, the population was homogeneous.
Therefore $x_i^{GG}(0) = x_{iH}$, $x_i^{BB}(0) = 1 - x_{iH}$ and
$x_i^{GB}(0) = x_i^{BG}(0) = 0$.

Once the fractions in equilibrium $x_i^{A_1A_2}$ are known, the
probabilities $\theta_{ij}$ that an $i$-strategist helps a $j$-strategist
($i, j = 1, 2$) are obtained as

$$\theta_{1j} = (1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_1^{G*})\,\chi_\beta(x_j^{G*})\,a_{1\alpha\beta}, \qquad \theta_{2j} = (1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_2^{*G})\,\chi_\beta(x_j^{*G})\,a_{2\alpha\beta}, \tag{11}$$

where we have introduced the short-hand notation
$x_i^{G*} = \sum_{A_2} x_i^{GA_2}$ and $x_i^{*G} = \sum_{A_1} x_i^{A_1G}$ to denote the sum over
a given reputation. Obviously, $x_i^{G*}$ ($x_i^{*G}$) is the fraction
of $i$-strategists that are judged as good by the resident
(mutant) players irrespective of the mutant's (resident's)
judgment.

Finally, the average payoff $W(i|j)$ that an $i$-strategist
receives from a $j$-strategist can be computed as

$$W(i|j) = \begin{cases} (b - c)\,\theta_{ii}, & i = j, \\ b\,\theta_{ji} - c\,\theta_{ij}, & i \neq j. \end{cases} \tag{12}$$

The resident population cannot be invaded by the mutants
if $W(1|1) > W(2|1)$, or if $W(1|1) = W(2|1)$ and
$W(1|2) > W(2|2)$. If the resident strategy resists the
invasions of all the other mutant strategies it is considered
evolutionarily stable.
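
The invasion criterion can be written as a short test on the four pairwise payoffs. The sketch below assumes that `theta[(i, j)]` stores the probability that an $i$-strategist helps a $j$-strategist; the function names, the tolerance and the numbers in the usage example are hypothetical.

```python
def pair_payoff(i, j, theta, b, c):
    """Payoff W(i|j): benefit received from j minus cost paid to help j."""
    if i == j:
        return (b - c) * theta[(i, i)]
    return b * theta[(j, i)] - c * theta[(i, j)]


def resident_resists(theta, b, c, tol=1e-12):
    """True if residents (1) cannot be invaded by mutants (2)."""
    w11 = pair_payoff(1, 1, theta, b, c)
    w21 = pair_payoff(2, 1, theta, b, c)
    if w11 > w21 + tol:
        return True
    w12 = pair_payoff(1, 2, theta, b, c)
    w22 = pair_payoff(2, 2, theta, b, c)
    return abs(w11 - w21) <= tol and w12 > w22 + tol


# Hypothetical numbers: residents mostly help each other and rarely help mutants.
theta = {(1, 1): 0.98, (1, 2): 0.1, (2, 1): 0.0, (2, 2): 0.0}
print(resident_resists(theta, b=2.0, c=1.0))
```

A tolerance is used for the equality $W(1|1) = W(2|1)$ because the payoffs come from numerically computed fractions.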

C. Stability against cheating

Consider now a situation in which actions are not
always witnessed; instead, there is a chance that an action
passes unnoticed by the rest of the population. In this situation
individuals may be tempted to cheat by defecting
in situations in which cooperation would be required
to keep a good reputation. The appearance of this kind of
mutation introduces a new set of strategies, parameterized
by the cheating probability $p_{ch}$, which might render
unstable strategies that would otherwise resist invasions.
The stability will of course be a function of the probability
that the action is witnessed, $p_{dis}$.

To address this issue let us consider that residents
decide to cheat with a probability $p_{ch,1}$ and mutants do
so with a probability $p_{ch,2}$, in the hope that they are
not discovered. However their cheating will actually be
discovered with a probability $p_{dis}$. Assuming the same
moral assessments and action rules for all individuals,



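$x_1^{G*} = x_1^{*G} = x_{1H}^{ch}$ and $x_2^{G*} = x_2^{*G} = x_2^{ch}$, where the
fractions $x_{1H}^{ch}$ and $x_2^{ch}$ are calculated as above from Eqs. (4)
and (A3), but incorporating the probability of being
discovered if they cheat. Likewise $P_{i,\alpha\beta}$ is replaced by

$$P_{i,\alpha\beta} = (1 - \epsilon_A)(1 - p_{dis}\,p_{ch,i})\,m_{i\alpha\beta}(a_{i\alpha\beta}) + (\epsilon_A + p_{dis}\,p_{ch,i} - \epsilon_A\,p_{dis}\,p_{ch,i})\,m_{i\alpha\beta}(D). \tag{13}$$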

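Finally, the probabilities of cooperation [cf. Eq. (11)]
are modified as

$$\begin{aligned}
\theta_{11}^{ch} &= (1 - p_{ch,1})(1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_{1H}^{ch})\,\chi_\beta(x_{1H}^{ch})\,a_{1\alpha\beta}, \\
\theta_{12}^{ch} &= (1 - p_{ch,1})(1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_{1H}^{ch})\,\chi_\beta(x_2^{ch})\,a_{1\alpha\beta}, \\
\theta_{21}^{ch} &= (1 - p_{ch,2})(1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_2^{ch})\,\chi_\beta(x_{1H}^{ch})\,a_{2\alpha\beta}, \\
\theta_{22}^{ch} &= (1 - p_{ch,2})(1 - \epsilon_A) \sum_{\alpha\beta} \chi_\alpha(x_2^{ch})\,\chi_\beta(x_2^{ch})\,a_{2\alpha\beta}.
\end{aligned} \tag{14}$$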
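
The way cheating enters the judgment and cooperation probabilities can be sketched as follows. The example assumes, as a reading of the text rather than a statement from it, that a donor is seen defecting either when an action error occurs or when she cheats without error and is discovered; all function and parameter names are hypothetical.

```python
def judged_good_prob_cheating(m_planned, m_defect, eps_a, p_ch, p_dis):
    """Probability of being judged good for one (alpha, beta) pair.

    m_planned is the judgment m(a) of the prescribed action and m_defect the
    judgment m(D) of a defection, both 0 or 1.
    """
    seen_defecting = eps_a + p_dis * p_ch - eps_a * p_dis * p_ch
    return (1 - seen_defecting) * m_planned + seen_defecting * m_defect


def chi(gamma, x):
    """Fraction of individuals with reputation gamma (1 = G, 0 = B)."""
    return gamma * x + (1 - gamma) * (1 - x)


def cooperation_prob_cheating(a, x_donor, x_recipient, eps_a, p_ch):
    """A donor cooperates only if she neither cheats nor makes an action error."""
    return (1 - p_ch) * (1 - eps_a) * sum(
        chi(al, x_donor) * chi(be, x_recipient) * a[(al, be)]
        for al in (0, 1) for be in (0, 1)
    )


p_good = judged_good_prob_cheating(1, 0, eps_a=0.01, p_ch=0.5, p_dis=0.4)
print(p_good)
```

Setting `p_ch = 0` or `p_dis = 0` in `judged_good_prob_cheating` recovers the judgment probability of the model without cheating.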



IV. RESULTS

A. Stability of strategies

Our aim is to identify strategies that are evolutionarily
stable. In principle this requires checking, for every strategy,
whether it can be invaded by every other strategy.
However the number of pairs of strategies is larger than
$1.5 \times 10^7$, so this becomes too demanding a computational
task. Accordingly we proceed in two steps: (i) we look
for all strategies that are stable against invasions by other
strategies sharing the same moral assessment; and (ii) we
study the stability of these selected strategies against all
the remaining ones.
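
The saving achieved by the two-step procedure can be estimated with simple counting. In this sketch pairs are counted as ordered (resident, mutant) pairs, which is an assumption about how the text counts them, and `second_step_pairs` is a hypothetical helper whose argument is the number of strategies that survive step (i).

```python
n_action = 2 ** 4  # action modules
n_moral = 2 ** 8   # moral assessment modules
n_strategies = n_action * n_moral

# Brute force: every ordered (resident, mutant) pair of distinct strategies.
all_pairs = n_strategies * (n_strategies - 1)

# Step (i): the mutant shares the resident's moral assessment.
same_moral_pairs = n_moral * n_action * (n_action - 1)


def second_step_pairs(n_survivors):
    """Step (ii): each survivor faces every strategy with a different moral assessment."""
    return n_survivors * (n_strategies - n_action)


print(all_pairs, same_moral_pairs)
```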

Our Eqs. (8) and (9) reduce to those used in Ref. [16]
if we fix the moral assessments and neglect moral errors.
We carried out our analysis for different values of the
action error $\epsilon_A$ (0.1, 0.01 and 0.001) and of the benefit-to-cost
ratio $b/c$ (1.2, 1.5, 2 and 3).

In Fig. 1 we represent the strategies that are stable
against invasions by all strategies sharing the same moral
assessment, as a function of their normalized average payoff
$\bar{W}_H = W_H\,[(b - c)(1 - \epsilon_A)]^{-1}$ and their coherence.
These strategies always appear in pairs since there is a
symmetry in the reputation: if the labels "good" and "bad"
are exchanged the results are not affected (see [16] for
more details). Notice though that there is symmetry only
in the moral assessment but not in the action. The reason
is that cooperating and defecting are not just labels, because
they have consequences for the payoffs obtained. It
is easy to show, using Eq. (7), that the sum of the coherences
of a strategy and its "mirror" strategy is always 1.
Coherence thus provides an external assessment of moral
labels, breaking the symmetry and making it possible to
differentiate between a strategy and its mirror. In Fig. 1 we
only show the results for the coherent strategy ($h > 0.5$)
of each pair and report how many pairs $N_p$ are shown.

Figure 1 shows that the larger the benefit-to-cost ratio,
the higher the number of stable strategies; in other
words, it is difficult to break into a population whose
individuals obtain high rewards for help. Moreover, we have
counted the number of pairs of strategies in which each
strategy can be invaded by the other (i.e., at least one
mixed equilibrium is formed). The number of these pairs
also appears to be larger the higher the benefit-to-cost
ratio (2500-2600 pairs for $b/c = 3$ vs. 1500-1700 for the
remaining cases). Therefore even if a mutant invades a
resident strategy, it is less likely that it eventually
dominates the population if $b/c$ is high. From Fig. 1 we also
conclude that poverty (high $\epsilon_A$) allows invaders to spread
more easily in the resident population.

On the other hand, payoff and coherence seem to be
correlated. Specifically, stable strategies with high payoff
are highly coherent (and their mirror strategies highly
incoherent). In Table I we list all coherent stable strategies
along with their payoffs for $b/c = 2$ and $\epsilon_A = 0.01$.
Most of them coincide with those found in Ref. [16]. There
are some minor differences though, because we are using
slightly different models. The eight strategies with the
highest payoff correspond to the so-called Leading Eight
[16]. These strategies are present in all cases shown in
Fig. 1. All stable strategies have some common features:
(i) not helping good individuals is always considered bad,
(ii) good individuals never help bad ones, (iii) good
individuals always help good individuals (except when errors
occur) and that is judged as good. (There are two strategies
for which the last feature is quite the opposite,
but they receive rather low payoffs.)

Notice that the absence of errors in the moral assessments
renders all defective strategies (strategies that always
defect) vulnerable to invasions. Ohtsuki and Iwasa
[16] found that all defective strategies were stable; the
reason is that although these strategies never reward,
errors in judgments provide them some payoff. This does
not happen in the present model. Thus defective strategies
are no longer stable.

Having identified the strategies that cannot be invaded
by others with the same moral assessment, we study
which of them are actually stable against the invasion
by any other strategy. We have found that all those
strategies remain stable even if strategies with different
moral assessments try to invade them. Besides, we have
also checked that strategies that can be invaded by other
strategies with the same moral assessment can be invaded
by some strategies with different moral assessments
as well.

FIG. 1. Representation of the normalized average payoff $\bar{W}_H$ as a function of the coherence $h$ for the stable strategies. Only
the coherent strategy ($h > 0.5$) of each pair is represented. Different panels show results for different $\epsilon_A$ and $b/c$.



B. Robustness against a rumor spreading 

We have checked the sensitivity of these results with
respect to a different choice of the initial conditions to solve
Eqs. (8) and (9). In Sec. III B we made the assumption
that, before a mutation occurs, all individuals share the
same opinion about everybody because the population is
homogeneous. But other choices are possible based on
different assumptions. One of them, which leads people
to have different opinions, is justified by the purported
spread of a rumor. Rumors can spread misjudgments
and lead a fraction of the population to disagree with the
general opinion. This choice of initial conditions may be
modeled as

$$\begin{aligned}
x_i^{GG}(0) &= (1 - \epsilon_B)\,x_{iH}, \\
x_i^{GB}(0) &= \epsilon_B\,x_{iH}, \\
x_i^{BB}(0) &= (1 - \epsilon_G)(1 - x_{iH}), \\
x_i^{BG}(0) &= \epsilon_G\,(1 - x_{iH}),
\end{aligned} \tag{15}$$

where $\epsilon_B$ ($\epsilon_G$) is the fraction of individuals that are
misjudged as bad (good) by the mutants due to the rumor.
Note that if $\epsilon_B = \epsilon_G = 0$ the whole population agrees
in its judgments and we recover the former initial conditions.
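
A small helper makes the structure of these rumor initial conditions explicit. The function name, the argument names and the use of string labels for the reputations are assumptions of this sketch; the first label is the resident judgment and the second the mutant judgment.

```python
def rumor_initial_condition(x_h, eps_b, eps_g):
    """Initial fractions keyed by (resident judgment, mutant judgment).

    eps_b: fraction of good individuals that mutants misjudge as bad.
    eps_g: fraction of bad individuals that mutants misjudge as good.
    """
    return {
        ("G", "G"): (1 - eps_b) * x_h,
        ("G", "B"): eps_b * x_h,
        ("B", "B"): (1 - eps_g) * (1 - x_h),
        ("B", "G"): eps_g * (1 - x_h),
    }


x0 = rumor_initial_condition(x_h=0.9, eps_b=0.05, eps_g=0.02)
print(sum(x0.values()))
```

Whatever the rumor parameters, the four fractions add up to one, and with both parameters set to zero the shared-opinion initial condition is recovered.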



Depending on the (small) values of $\epsilon_B$ and $\epsilon_G$, we
have checked that the initial conditions (15) may lead
to three different scenarios. In the first one $x_i^{GG} = x_{iH}$,
$x_i^{BB} = 1 - x_{iH}$ and $x_i^{GB} = x_i^{BG} = 0$, so that the rumor
fades away and we recover a homogeneous population.
In the second scenario the rumor remains or even grows
($x_i^{GG}$ and $x_i^{BB}$ decrease and $x_i^{GB}$ and $x_i^{BG}$ increase), but
the payoff obtained by the mutants is lower than that
obtained by the residents. Consequently the mutants are
expelled and a homogeneous population is restored. In the
third scenario the rumor also remains and the mutants
obtain higher payoffs than the residents, so that the rumor
eventually spreads. We have found that around 850
strategies lie in this last case (considering differences
between mutants' and residents' payoffs higher than $10^{-6}$)
when $b/c = 2$ and $\epsilon_A = 0.01$. Fortunately none of these
strategies belongs to the group of the stable ones, so this
rumor spreading does not affect the evolutionary fate of
the population.

Strategy  Moral assessment   Action rules  Payoff
...       ...                ... C D       0.9901
IId       G B B G G B B G    C D C D       0.9901
IIIa      G B G G G B B B    C D C D       0.9900
IIIb      G B B G G B B B    C D C D       0.9900
          G B B B G B G B    C D C C       0.9135
          G B B B G B G G    C D C D       0.9049
          G B B B G B B G    C D C D       0.9049
          G B B G B B G B    C D D C       0.8340
          G B G G B B G B    C D D C       0.8340
          G B B G B B B G    C D D D       0.8264
          G B B G B B G G    C D D D       0.8264
          G B G G B B B G    C D D D       0.8264
          G B G G B B G G    C D D D       0.8264
          B B B G G B B B    D D C D       0.2500
          B B G G G B B B    D D C D       0.2500

TABLE I. Coherent stable strategies and their normalized average payoffs $\bar{W}_H$ for the case $b/c = 2$ and $\epsilon_A = 0.01$. The top
eight strategies (labeled Ia through IIIb) are the so-called Leading Eight [16]. They are the ones with the highest payoffs
among all the stable strategies obtained for a given benefit-to-cost ratio ($b/c$).



C. Stability in the presence of cheating 

We have also studied the stability of the leading eight
strategies against the invasion of cheaters. We divided the
leading eight strategies into Groups I, II and III according
to their different behavior (as was done in [16]). Figure 2
represents the limiting $p^*_{dis}$ below (above) which mutants
who cheat with a higher (lower) probability than residents
can invade. In Appendix B we calculate analytically the
shape of this curve in the limit $\epsilon_A \to 0$. Figure 2 shows
that below the curve $p^*_{dis}(p_{ch})$ cheating increases without
bound through subsequent invasions until the whole
population is dominated by defectors. In other words, if
cheating occurs and the probability of being discovered
is not high enough, none of the leading eight strategies
survives. In particular, if $p_{dis} < c/b$ full defection is the
unavoidable fate of the population. Thus, if only small
mutations are allowed in an honest population, we find the
Hamilton-like rule $b\,p_{dis} > c$ for the survival of cooperation.

Increasing $\epsilon_A$ makes it even easier for cheaters to
invade, with the exception of the strategies belonging to
Group III, which seem to be insensitive to the effect of
errors in action.
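
The Hamilton-like rule translates into a threshold on the probability of discovery. The sketch below simply evaluates that threshold for the benefit-to-cost ratios used in this work; the function names are hypothetical, and the rule is taken as stated in the text, for small mutations in an honest population and vanishing action error.

```python
def critical_discovery_probability(b, c):
    """Smallest p_dis compatible with the rule b * p_dis > c (exclusive bound)."""
    return c / b


def cooperation_survives(b, c, p_dis):
    """Hamilton-like rule: cooperation survives if b * p_dis > c."""
    return b * p_dis > c


for ratio in (1.2, 1.5, 2.0, 3.0):
    # With c = 1 the benefit b equals the benefit-to-cost ratio.
    print(ratio, round(critical_discovery_probability(b=ratio, c=1.0), 3))
```

The larger the benefit-to-cost ratio, the lower the probability of discovery needed to protect cooperation.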



V. DISCUSSION 

We have carried out a systematic study of the stability
of all possible third-order indirect reciprocity strategies.
We extended the work of Ohtsuki and Iwasa [16] by
confronting all the strategies against the others regardless
of whether they have the same moral assessments
or not. The main difference with their model is that in
ours individuals directly witness all actions. Allowing
individuals in the same population to have different moral
assessments and action rules makes indirect observation
models computationally unfeasible (we must store everybody's
opinion of everybody else at every time step). For
the same reason, errors in judgments cannot be accounted
for in direct observation models. Thus we only consider
errors in performing the actions. The only exception to
this assumption is the need to introduce errors in judgment
to calculate, in some special cases, the stationary
fractions of good and bad individuals in homogeneous
populations. But this is just a technical issue that allows
us to resolve a degeneracy of solutions, and there is no
inconsistency because the results do not depend on the
value of this error.

The strategies which are stable against invasions by
other strategies sharing the same moral assessment turn
out to be also stable against invasions by any other strategy.
This means that if a strategy can resist the invasion
of all the other strategies that share its moral assessment,
it can resist any invasion whatsoever.

We have checked that the higher the benefit-to-cost ra- 
tio and the lower the action errors the higher the number 
of stable strategies obtained. An increase of the num- 
ber of errors in action can be interpreted as a measure 
of the poverty of the population, since these errors arise 
from lack of resources. We have shown that scarcity of 
resources favors invasions. On the other hand, we have 
checked that populations whose members receive more 
benefit for a given cost are more resistant to invasions. 

As pointed out in Ref. [16], there is a symmetry between
the moral assessments of the strategies. Good and
bad are just labels with no proper meaning, in contrast
to actions, which have a direct influence on the payoffs. In
order to break that symmetry and provide a meaning to
those labels we have introduced the concept of coherence.
Coherence links moral assessments with action rules. We
have shown that stable strategies appear in pairs due to
the above-mentioned symmetry, but their coherence values are
complementary. This allows us to choose only one of the
strategies (the most coherent) within each pair for later
analysis and interpretation.

FIG. 2. Limit curves of $p^*_{dis}$ as a function of $p_{ch}$ that divide the regions where cheating can be increased (below the curves)
and decreased (above the curves) through the invasion of mutants with different $p_{ch}$. Different types of lines represent different
values of $\epsilon_A$: 0.1 (continuous), 0.01 (dashed) and 0.001 (dotted). Different panels show results for different groups of leading
eight strategies and $b/c$.

The stable strategies we obtain include the Leading
Eight found by Ohtsuki and Iwasa [16]. These are also the
most efficient ones (those with highest payoffs). Both the
Leading Eight and the remaining stable strategies
that we have obtained share some features, and except
for the two least efficient strategies (with $W_h = 0.25$),
all of them obtain high average payoffs ($W_h > 0.8$).
They identify defectors ($m_{GG}(D) = m_{BG}(D) = B$) and,
except the two least efficient strategies, maintain
cooperation ($a_{GG} = C$ and $m_{GG}(C) = G$). All of them
punish defectors ($a_{GB} = D$), although three of the
stable strategies (with $W_h \approx 0.9$) do not judge this as a
good behavior. Finally the most efficient stable
strategies ($W_h > 0.9$) forgive bad individuals who help good
players ($m_{BG}(C) = G$ and $a_{BG} = C$). The more of
these features the strategies follow the higher their
payoff. Thus the three strategies with $W_h \approx 0.9$ turn good
punishers into bad individuals and they can only restore
their reputation by helping good individuals. And in the
case of strategies with $W_h < 0.9$, bad individuals cannot
increase their reputation by helping good players, but
only by interacting with other bad individuals.

We have also found that all these strategies may
become unstable if cheaters arise. If the probability of
witnessing a cheat is not high enough, cheaters can take over
an honest population. Upon increasing the cheating
probability $p_{\mathrm{dis}} > c/b$ the population eventually turns into
pure defectors. Interestingly, the condition for a
population to resist this effect is of the Hamilton type, namely,
$b\,p_{\mathrm{dis}} > c$, where $b$ is the benefit and $c$ the cost. Errors in
action make this condition even more restrictive for the
stability of an honest population.
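
As a minimal numerical illustration of the Hamilton-type condition $b\,p_{\mathrm{dis}} > c$ quoted above, the following Python sketch evaluates it for given parameters. The function names are hypothetical (they do not come from the paper), and the sketch deliberately ignores action errors, which according to the text make the condition more restrictive.

```python
def critical_discovery_probability(b: float, c: float) -> float:
    """Return c / b, the value of p_dis at which b * p_dis equals c."""
    if b <= 0 or c <= 0:
        raise ValueError("benefit b and cost c must be positive")
    return c / b


def honest_population_resists(b: float, c: float, p_dis: float) -> bool:
    """Hamilton-type condition quoted in the text: b * p_dis > c.

    Assumption of this sketch: no action errors are considered.
    """
    if not 0.0 <= p_dis <= 1.0:
        raise ValueError("p_dis is a probability and must lie in [0, 1]")
    return p_dis > critical_discovery_probability(b, c)


if __name__ == "__main__":
    b, c = 4.0, 1.0  # illustrative values only
    for p_dis in (0.1, 0.25, 0.5):
        print(p_dis, honest_population_resists(b, c, p_dis))
```

The check is written in terms of the ratio $c/b$ so that the analogy with the relatedness threshold in Hamilton's rule is explicit.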

Cheating is always a danger for cooperation based on
indirect reciprocity. Even in societies where this
mechanism is of utmost importance cheating always threatens
honest behavior. For instance, the (now extinct)
Patagonian tribes of the Yamana are among the reported
societies most strongly based on indirect reciprocity [28].
Sharing food even with nonrelatives appeared to be the
default behavior. Not sticking to it brought a bad
reputation and severe social punishment (e.g., not
participating in further food sharing). Yet, cheating among the
Yamana was reported to occur when the chances of being
discovered were low (for instance, because the prey obtained
was easy to hide; see Ref. [28], p. 197).

One of the problems that emerges from considering dif- 
ferent moral assessments is the possibility that the frac- 
tions of good and bad individuals may depend on the 
initial setup. We sort out this issue by choosing realistic 
initial conditions for the differential equations describing 
the evolution of these fractions. Essentially, we assume 
that mutations do not change the previous judgments 
that individuals had on each other. This notwithstand- 
ing, we have analyzed other initial conditions in which 
not all individuals have the same opinion. A typical setup 
where this might happen is when a rumor is spread over 
a fraction of the population. We have checked that, al- 
though misjudgement can survive or even spread over a 
larger fraction of the population, it eventually disappears 
because mutants with a wrong judgement get less payoff 
than residents who use one of the stable strategies. 

Admittedly, in order to carry out such a systematic
analysis as we have performed here, we have had to
sacrifice some realism in the model. On the one hand,
we have considered that reputation can only have two
states: good and bad. This binary reputation has been
used in several preceding studies [ly, [23] and implies that
only the actions that happen in the last round are taken
into account to assign reputation. However, Tanabe et
al. [29] have studied a model with trinary reputations
and showed that some strategies (like the so-called image
scoring) can be stable in a trinary-reputation model but
not in a binary-reputation one. On the other hand, we
have considered that every player has complete
information of every single interaction in the population (except
when we introduced cheating). This is too strong an
assumption and some studies discuss the effect of a
limited access to the information (see [30] and the references
therein).

Appendix A

The two sets of Eqs. (8) and (9) can be simplified in
the steady state $dx/dt = 0$. Thus, summing over the
reputation $A_2$ in Eqs. (8) we obtain



A 2 



Xih- 



(Al) 



Therefore we can reduce Eqs. (8) to just two equations
in two unknowns (e.g., $x_1^{GG}$ and $x_1^{BB}$) by setting

\begin{equation}
x_1^{GB} = x_1^{G*} - x_1^{GG}, \qquad
x_1^{BG} = 1 - x_1^{G*} - x_1^{BB}.
\tag{A2}
\end{equation}

The two remaining equations from (8) have to be solved
numerically using the initial conditions discussed in
Sec. lETBl 

On the other hand, the set of Eqs. (9) is decoupled
from the set (8), and so they can be solved analytically
after solving the latter. This is easier if $x_2^{G}$ is calculated
first,

\begin{equation}
\begin{aligned}
x_2^{G} = {} & \left[ x_1^{G} P_{2,01} + (1 - x_1^{G}) P_{2,00} \right] \\
& \times \left[ 1 + x_1^{G} (P_{2,01} - P_{2,11})
+ (1 - x_1^{G})(P_{2,00} - P_{2,10}) \right]^{-1}.
\end{aligned}
\tag{A3}
\end{equation}

Hence Eq. (9) reduces to a linear system of two equations
in the two unknowns ($x_2^{G}$, $x_2^{B}$).
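
The evaluation of Eq. (A3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, `P2[a][b]` stands for the quantity $P_{2,ab}$ with indices in {0, 1}, and the helper `balance_residual` assumes that Eq. (A3) is the solution of a steady-state balance in which the first index of $P_{2,ab}$ refers to the mutant's own current reputation (1 good, 0 bad) and the second to that of a resident recipient. That reading is inferred from the algebraic form of Eq. (A3), not stated in this appendix.

```python
from typing import Sequence


def mutant_good_fraction(x1_good: float, P2: Sequence[Sequence[float]]) -> float:
    """Evaluate Eq. (A3) for the fraction of good mutants.

    x1_good  : fraction of good residents entering Eq. (A3).
    P2[a][b] : the quantity P_{2,ab} of Eq. (A3), with a, b in {0, 1}.
    """
    num = x1_good * P2[0][1] + (1.0 - x1_good) * P2[0][0]
    den = (1.0
           + x1_good * (P2[0][1] - P2[1][1])
           + (1.0 - x1_good) * (P2[0][0] - P2[1][0]))
    if den == 0.0:
        raise ZeroDivisionError("Eq. (A3) is degenerate for these inputs")
    return num / den


def balance_residual(x2_good: float, x1_good: float,
                     P2: Sequence[Sequence[float]]) -> float:
    """Residual of the assumed steady-state balance behind Eq. (A3).

    Assumption: good mutants stay good with probability `stay` and bad
    mutants become good with probability `gain`.
    """
    stay = x1_good * P2[1][1] + (1.0 - x1_good) * P2[1][0]
    gain = x1_good * P2[0][1] + (1.0 - x1_good) * P2[0][0]
    return x2_good * stay + (1.0 - x2_good) * gain - x2_good


if __name__ == "__main__":
    P2 = [[0.3, 0.9], [0.8, 0.95]]  # arbitrary illustrative values
    x2 = mutant_good_fraction(0.7, P2)
    print(x2, balance_residual(x2, 0.7, P2))
```

Because the expression is a ratio, the sketch raises an error when the denominator vanishes, which is one way the degenerate scenarios mentioned next may show up numerically.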



There are scenarios where the solution of $x^{A_1 A_2}$ turns
out to be degenerate. In these situations the set of
Eqs. (9) needs to be integrated along with the set of
Eqs. (8).



Appendix B 

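Consider a resident population whose individuals play
one of the leading eight strategies with probability
$1 - p_{\mathrm{ch},1}$ but defect otherwise. Consider mutants who do the
same, but with a probability $1 - p_{\mathrm{ch},2}$. For simplicity let
us assume the limiting case $\epsilon_a \to 0$. Applying adaptive
dynamics [31], the curve separating the regions where the
mutant can or cannot invade the population is given by

\begin{equation}
\left. \frac{\partial W(p_{\mathrm{ch},2}, p_{\mathrm{ch},1})}{\partial p_{\mathrm{ch},2}}
\right|_{p_{\mathrm{ch},2} = p_{\mathrm{ch},1}} = 0,
\tag{B1}
\end{equation}

where the payoff $W(p_{\mathrm{ch},2}, p_{\mathrm{ch},1})$ is equivalent to $W(2|1)$.
According to Eq. (12),

\begin{equation}
\frac{\partial W(p_{\mathrm{ch},2}, p_{\mathrm{ch},1})}{\partial p_{\mathrm{ch},2}}
= b \, \frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
- c \, \frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}}.
\tag{B2}
\end{equation}

To go further we need to separate the strategies of the
three groups.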



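1. Group I strategies

Using Eqs. (14) for the leading eight strategies, the
probabilities of cooperation $\theta^{\mathrm{ch}}_{ij}$ are

\begin{equation}
\begin{aligned}
\theta^{\mathrm{ch}}_{12} &= (1 - p_{\mathrm{ch},1})
\left[ x_2^{\mathrm{ch},G} + (1 - x_1^{\mathrm{ch},G})(1 - x_2^{\mathrm{ch},G}) \right], \\
\theta^{\mathrm{ch}}_{21} &= (1 - p_{\mathrm{ch},2})
\left[ x_1^{\mathrm{ch},G} + (1 - x_1^{\mathrm{ch},G})(1 - x_2^{\mathrm{ch},G}) \right].
\end{aligned}
\tag{B3}
\end{equation}

Thus

\begin{equation}
\begin{aligned}
\frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
&= (1 - p_{\mathrm{ch},1}) \, x_1^{\mathrm{ch},G} \,
\frac{\partial x_2^{\mathrm{ch},G}}{\partial p_{\mathrm{ch},2}}, \\
\frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}}
&= (1 - x_1^{\mathrm{ch},G}) \left[ x_2^{\mathrm{ch},G}
- (1 - p_{\mathrm{ch},2}) \frac{\partial x_2^{\mathrm{ch},G}}{\partial p_{\mathrm{ch},2}} \right] - 1.
\end{aligned}
\tag{B4}
\end{equation}
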
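The fractions $x_1^{\mathrm{ch},G}$ and $x_2^{\mathrm{ch},G}$ are obtained from Eqs. (Q|
and (A3). To that purpose we need to substitute

\begin{equation}
P^{\mathrm{ch}}_{i,11} = P^{\mathrm{ch}}_{i,01} = 1 - p_{\mathrm{dis}} p_{\mathrm{ch},i}, \qquad
P^{\mathrm{ch}}_{i,10} = 1
\tag{B5}
\end{equation}

and

\begin{equation}
P^{\mathrm{ch}}_{i,00} = 1 - p_{\mathrm{dis}} p_{\mathrm{ch},i}.
\tag{B6}
\end{equation}

Thus $x_1^{\mathrm{ch},G}$ is the solution of

\begin{equation}
p_{\mathrm{dis}} p_{\mathrm{ch},1} \left( x_1^{\mathrm{ch},G} \right)^2
= (1 - p_{\mathrm{dis}} p_{\mathrm{ch},1})(1 - x_1^{\mathrm{ch},G}),
\tag{B7}
\end{equation}

and once it is obtained,

\begin{equation}
\begin{aligned}
x_2^{\mathrm{ch},G} &= \frac{1 - p_{\mathrm{dis}} p_{\mathrm{ch},2}}
{1 - (1 - x_1^{\mathrm{ch},G}) p_{\mathrm{dis}} p_{\mathrm{ch},2}}, \\
\frac{\partial x_2^{\mathrm{ch},G}}{\partial p_{\mathrm{ch},2}}
&= - \frac{p_{\mathrm{dis}} \, x_1^{\mathrm{ch},G}}
{\left[ 1 - (1 - x_1^{\mathrm{ch},G}) p_{\mathrm{dis}} p_{\mathrm{ch},2} \right]^2}.
\end{aligned}
\tag{B8}
\end{equation}

Substituting into (B4) and setting $p_{\mathrm{ch},2} = p_{\mathrm{ch},1} = p_{\mathrm{ch}}$
yields

\begin{equation}
\begin{aligned}
\frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
&= - \frac{(1 - p_{\mathrm{ch}}) \, p_{\mathrm{dis}} \left( x_1^{\mathrm{ch},G} \right)^2}
{\left[ 1 - p_{\mathrm{dis}} p_{\mathrm{ch}} (1 - x_1^{\mathrm{ch},G}) \right]^2}, \\
\frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}}
&= \frac{x_1^{\mathrm{ch},G} \left[ p_{\mathrm{dis}} (1 - x_1^{\mathrm{ch},G}) - 1 \right]}
{\left[ 1 - p_{\mathrm{dis}} p_{\mathrm{ch}} (1 - x_1^{\mathrm{ch},G}) \right]^2}.
\end{aligned}
\tag{B9}
\end{equation}

Therefore $p^*_{\mathrm{dis}}$ is the solution of the system

\begin{equation}
\begin{aligned}
p^*_{\mathrm{dis}} \left[ b (1 - p_{\mathrm{ch}}) x^* + c (1 - x^*) \right] &= c, \\
p^*_{\mathrm{dis}} \, p_{\mathrm{ch}} (x^*)^2 &= (1 - p^*_{\mathrm{dis}} p_{\mathrm{ch}})(1 - x^*).
\end{aligned}
\tag{B10}
\end{equation}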
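
Since the system (B10) has no simple closed form, a numerical solution is the natural route. The following Python sketch is one possible way to do it, not the procedure used by the authors: the second equation of (B10) is solved for $x^*$ at fixed $p_{\mathrm{dis}}$, and the first equation is then solved for $p^*_{\mathrm{dis}}$ by bisection on $[0, 1]$. The function names and the choice of bisection are assumptions of this sketch.

```python
import math
from typing import Optional, Tuple


def good_fraction(q: float) -> float:
    """Root in [0, 1] of q * x**2 = (1 - q) * (1 - x), with q = p_dis * p_ch."""
    B = 1.0 - q
    if B <= 0.0:  # q = 1: the equation reduces to x**2 = 0
        return 0.0
    # Cancellation-free form of the positive root of q*x**2 + B*x - B = 0.
    return 2.0 * B / (B + math.sqrt(B * B + 4.0 * q * B))


def group_i_threshold(b: float, c: float, p_ch: float,
                      iters: int = 100) -> Optional[Tuple[float, float]]:
    """Solve the system (B10) for (p_dis*, x*); None if p_dis* would exceed 1."""

    def f(p_dis: float) -> float:
        # First equation of (B10), written as f(p_dis) = 0.
        x = good_fraction(p_dis * p_ch)
        return p_dis * (b * (1.0 - p_ch) * x + c * (1.0 - x)) - c

    lo, hi = 0.0, 1.0  # f(0) = -c < 0
    if f(hi) < 0.0:    # no sign change on [0, 1]
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    p_star = 0.5 * (lo + hi)
    return p_star, good_fraction(p_star * p_ch)


if __name__ == "__main__":
    for p_ch in (0.0, 0.1, 0.2, 0.4):
        print(p_ch, group_i_threshold(2.0, 1.0, p_ch))
```

For $p_{\mathrm{ch}} = 0$ the second equation gives $x^* = 1$ and the first reduces to $b\,p^*_{\mathrm{dis}} = c$, so the sketch recovers the threshold $c/b$ in that limit.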



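2. Group II strategies

For the strategies of this group

\begin{equation}
\theta^{\mathrm{ch}}_{12} = (1 - p_{\mathrm{ch},1}) \, x_2^{\mathrm{ch},G}, \qquad
\theta^{\mathrm{ch}}_{21} = (1 - p_{\mathrm{ch},2}) \, x_1^{\mathrm{ch},G},
\tag{B11}
\end{equation}

hence their derivatives are

\begin{equation}
\frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
= (1 - p_{\mathrm{ch},1}) \frac{\partial x_2^{\mathrm{ch},G}}{\partial p_{\mathrm{ch},2}}, \qquad
\frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}} = - x_1^{\mathrm{ch},G}.
\tag{B12}
\end{equation}

Probabilities $P^{\mathrm{ch}}_{i,\alpha\beta}$ are now given by (B5) as well as
$P^{\mathrm{ch}}_{i,00} = 1$. Thus, after Eqs. © and (A3),

\begin{equation}
x_1^{\mathrm{ch},G} = \frac{1}{1 + p_{\mathrm{dis}} p_{\mathrm{ch},1}}, \qquad
x_2^{\mathrm{ch},G} = 1 - x_1^{\mathrm{ch},G} \, p_{\mathrm{dis}} p_{\mathrm{ch},2}.
\tag{B13}
\end{equation}

Substituting into (B12) and setting $p_{\mathrm{ch},2} = p_{\mathrm{ch},1} = p_{\mathrm{ch}}$
yields

\begin{equation}
\frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
= - \frac{(1 - p_{\mathrm{ch}}) \, p_{\mathrm{dis}}}{1 + p_{\mathrm{dis}} p_{\mathrm{ch}}}, \qquad
\frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}}
= - \frac{1}{1 + p_{\mathrm{dis}} p_{\mathrm{ch}}},
\tag{B14}
\end{equation}

and therefore

\begin{equation}
p^*_{\mathrm{dis}} = \frac{c}{b (1 - p_{\mathrm{ch}})}.
\tag{B15}
\end{equation}

3. Group III strategies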

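For the strategies of this group the probabilities of
cooperation and their derivatives are also given by
Eqs. (B11) and (B12), and the probabilities $P^{\mathrm{ch}}_{i,\alpha\beta}$ by (B5)
as well as $P^{\mathrm{ch}}_{i,00} = 0$. Thus, after Eqs. g]) and (A3),

\begin{equation}
x_1^{\mathrm{ch},G} = 1 - p_{\mathrm{dis}} p_{\mathrm{ch},1}, \qquad
x_2^{\mathrm{ch},G} = 1 - p_{\mathrm{dis}} p_{\mathrm{ch},2}.
\tag{B16}
\end{equation}

Substituting into (B12) and setting $p_{\mathrm{ch},2} = p_{\mathrm{ch},1} = p_{\mathrm{ch}}$
yields

\begin{equation}
\frac{\partial \theta^{\mathrm{ch}}_{12}}{\partial p_{\mathrm{ch},2}}
= - (1 - p_{\mathrm{ch}}) \, p_{\mathrm{dis}}, \qquad
\frac{\partial \theta^{\mathrm{ch}}_{21}}{\partial p_{\mathrm{ch},2}}
= - (1 - p_{\mathrm{dis}} p_{\mathrm{ch}}),
\tag{B17}
\end{equation}

and therefore

\begin{equation}
p^*_{\mathrm{dis}} = \frac{c}{c \, p_{\mathrm{ch}} + b (1 - p_{\mathrm{ch}})}.
\tag{B18}
\end{equation}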
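
The closed-form thresholds (B15) and (B18) can be cross-checked numerically. The Python sketch below is illustrative only. It assumes, as Eq. (B2) indicates, that the mutant payoff depends on $p_{\mathrm{ch},2}$ only through $b\,\theta_{12} - c\,\theta_{21}$, builds that combination from Eqs. (B11), (B13) and (B16), and verifies by a central finite difference that its derivative vanishes at the threshold. All function names are hypothetical.

```python
def threshold_group_ii(b: float, c: float, p_ch: float) -> float:
    """Eq. (B15); requires 0 <= p_ch < 1."""
    if not 0.0 <= p_ch < 1.0:
        raise ValueError("p_ch must lie in [0, 1)")
    return c / (b * (1.0 - p_ch))


def threshold_group_iii(b: float, c: float, p_ch: float) -> float:
    """Eq. (B18); requires 0 <= p_ch <= 1."""
    if not 0.0 <= p_ch <= 1.0:
        raise ValueError("p_ch must lie in [0, 1]")
    return c / (c * p_ch + b * (1.0 - p_ch))


def payoff_part(group: str, p2: float, p1: float,
                p_dis: float, b: float, c: float) -> float:
    """b*theta_12 - c*theta_21 for a mutant (p2) among residents (p1).

    Assumption: this is the only p2-dependent part of the payoff W(2|1).
    """
    if group == "II":
        x1 = 1.0 / (1.0 + p_dis * p1)   # Eq. (B13)
        x2 = 1.0 - x1 * p_dis * p2
    elif group == "III":
        x1 = 1.0 - p_dis * p1           # Eq. (B16)
        x2 = 1.0 - p_dis * p2
    else:
        raise ValueError("group must be 'II' or 'III'")
    theta12 = (1.0 - p1) * x2           # Eq. (B11)
    theta21 = (1.0 - p2) * x1
    return b * theta12 - c * theta21


def selection_gradient(group: str, p_ch: float, p_dis: float,
                       b: float, c: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative in Eq. (B1)."""
    up = payoff_part(group, p_ch + h, p_ch, p_dis, b, c)
    down = payoff_part(group, p_ch - h, p_ch, p_dis, b, c)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    b, c, p_ch = 3.0, 1.0, 0.2  # illustrative values only
    for group, thr in (("II", threshold_group_ii), ("III", threshold_group_iii)):
        p_star = thr(b, c, p_ch)
        print(group, p_star, selection_gradient(group, p_ch, p_star, b, c))
```

Both expressions reduce to $c/b$ for $p_{\mathrm{ch}} = 0$. A value of $p^*_{\mathrm{dis}}$ above 1 means that, within these formulas, no discovery probability places the population on the limit curve.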